Skip to content

Conversation

@xav-db
Copy link
Member

@xav-db xav-db commented Sep 15, 2025

Description

Related Issues

Closes #

Checklist when merging to main

  • No compiler warnings (if applicable)
  • Code is formatted with rustfmt
  • No useless or dead code (if applicable)
  • Code is easy to understand
  • Doc comments are used for all functions, enums, structs, and fields (where appropriate)
  • All tests pass
  • Performance has not regressed (assuming change was not to fix a bug)
  • Version number has been updated in helix-cli/Cargo.toml and helixdb/Cargo.toml

Additional Notes

@xav-db xav-db marked this pull request as ready for review September 15, 2025 00:47
Copilot AI review requested due to automatic review settings September 15, 2025 00:47
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds configurable vector similarity methods to the helix-db vector engine. Previously, the system was limited to cosine similarity with feature flags, but now supports configurable similarity methods including cosine distance, cosine similarity, and Euclidean distance.

  • Introduces a SimilarityMethod enum with three variants: CosineDistance, CosineSimilarity (default), and EuclideanDistance
  • Updates the vector distance calculation API to accept similarity method parameters
  • Removes the cosine feature flag and related conditional compilation directives

Reviewed Changes

Copilot reviewed 18 out of 18 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
helix-db/src/helix_engine/traversal_core/config.rs Adds SimilarityMethod enum and integrates it into vector configuration
helix-db/src/helix_engine/vector_core/vector_distance.rs Updates distance calculation to support multiple similarity methods
helix-db/src/helix_engine/vector_core/vector_core.rs Integrates similarity method into VectorCore constructor and usage
helix-db/src/helix_engine/vector_core/vector.rs Updates HVector distance_to method to accept similarity method parameter
helix-db/Cargo.toml Removes cosine feature flag from feature definitions
Multiple test files Updates test calls to use new distance calculation API with similarity method

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

let c = x - y;
c * c
})
.sum())
Copy link

Copilot AI Sep 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The euclidean_distance function returns the squared Euclidean distance, not the actual Euclidean distance. It should return the square root of the sum: Ok(from.iter().zip(to.iter()).map(|(x, y)| { let c = x - y; c * c }).sum::<f64>().sqrt())

Suggested change
.sum())
.sum::<f64>()
.sqrt())

Copilot uses AI. Check for mistakes.
pub mcp: Option<bool>,
pub bm25: Option<bool>,
pub schema: Option<String>,
pub embedding_model: Option<String>,
Copy link

Copilot AI Sep 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The removal of graphvis_node_label field appears throughout multiple files but the field is still referenced in line 64. This creates an inconsistent state where the field exists in some structs but not others.

Copilot uses AI. Check for mistakes.
xav-db added a commit that referenced this pull request Sep 20, 2025
…ods, v from id/type (#438)

## Description
<!-- Provide a brief description of the changes in this PR -->

## Related Issues
<!-- Link to any related issues using #issue_number -->

Closes #433 #436 #435 

## Checklist when merging to main
<!-- Mark items with "x" when completed -->

- [ ] No compiler warnings (if applicable)
- [ ] Code is formatted with `rustfmt`
- [ ] No useless or dead code (if applicable)
- [ ] Code is easy to understand
- [ ] Doc comments are used for all functions, enums, structs, and
fields (where appropriate)
- [ ] All tests pass
- [ ] Performance has not regressed (assuming change was not to fix a
bug)
- [ ] Version number has been updated in `helix-cli/Cargo.toml` and
`helixdb/Cargo.toml`

## Additional Notes
<!-- Add any additional information that would be helpful for reviewers
-->
@xav-db xav-db mentioned this pull request Oct 29, 2025
8 tasks
xav-db added a commit that referenced this pull request Nov 7, 2025
## Description
<!-- Provide a brief description of the changes in this PR -->

## Related Issues
<!-- Link to any related issues using #issue_number -->

Closes #670 #666 #667 #672  #668 #661 #655 #654 #652 #436 

## Checklist when merging to main
<!-- Mark items with "x" when completed -->

- [ ] No compiler warnings (if applicable)
- [ ] Code is formatted with `rustfmt`
- [ ] No useless or dead code (if applicable)
- [ ] Code is easy to understand
- [ ] Doc comments are used for all functions, enums, structs, and
fields (where appropriate)
- [ ] All tests pass
- [ ] Performance has not regressed (assuming change was not to fix a
bug)
- [ ] Version number has been updated in `helix-cli/Cargo.toml` and
`helixdb/Cargo.toml`

## Additional Notes
<!-- Add any additional information that would be helpful for reviewers
-->

<!-- greptile_comment -->

<h2>Greptile Overview</h2>

Updated On: 2025-11-07 00:19:04 UTC

<h3>Greptile Summary</h3>


This PR implements arena-based memory allocation for graph traversals
and refactors the worker pool's channel selection mechanism.

**Key Changes:**
- **Arena Implementation**: Introduced `'arena` lifetime parameter
throughout traversal operations (`in_e.rs`), replacing owned data with
arena-allocated references for improved memory efficiency
- **Worker Pool Refactor**: Replaced `flume::Selector` with a
parity-based `try_recv()`/`recv()` pattern to handle two channels
(`cont_rx` and `rx`) across multiple worker threads
- **Badge Addition**: Added Manta Graph badge to README

**Issues Found:**
- **Worker Pool Channel Handling**: The new parity-based approach
requires an even number of workers (≥2) and uses non-blocking
`try_recv()` followed by blocking `recv()` on alternating channels.
While this avoids a true busy-wait (since one `recv()` always blocks),
the asymmetry means channels are polled at different frequencies,
potentially causing channel starvation or unfair scheduling compared to
the previous `Selector::wait()` approach.

The arena implementation appears solid and follows Rust lifetime best
practices. The worker pool change seems to be addressing a specific
issue with core affinity (per commit `7437cf0f`), but the trade-off in
channel fairness should be monitored.

<details><summary><h3>Important Files Changed</h3></summary>



File Analysis



| Filename | Score | Overview |
|----------|-------|----------|
| README.md | 5/5 | Added Manta Graph badge to README - cosmetic
documentation change with no functional impact |
| helix-db/src/helix_engine/traversal_core/ops/in_/in_e.rs | 5/5 |
Refactored to use arena-based lifetimes ('arena) instead of owned data,
replacing separate InEdgesIterator struct with inline closures for
better memory management |
| helix-db/src/helix_gateway/worker_pool/mod.rs | 3/5 | Replaced flume
Selector with parity-based try_recv/recv pattern requiring even worker
count, but implementation has potential busy-wait issues that could
cause high CPU usage |

</details>


</details>


<details><summary><h3>Sequence Diagram</h3></summary>

```mermaid
sequenceDiagram
    participant Client
    participant WorkerPool
    participant Worker1 as Worker (parity=true)
    participant Worker2 as Worker (parity=false)
    participant Router
    participant Storage

    Client->>WorkerPool: process(request)
    WorkerPool->>WorkerPool: Send request to req_rx channel
    
    par Worker1 Loop (parity=true)
        loop Every iteration
            Worker1->>Worker1: try_recv(cont_rx) - non-blocking
            alt Continuation available
                Worker1->>Worker1: Execute continuation function
            else Empty
                Worker1->>Worker1: Skip (no busy wait here)
            end
            Worker1->>Worker1: recv(rx) - BLOCKS until request
            alt Request received
                Worker1->>Router: Route request to handler
                Router->>Storage: Execute graph operation
                Storage-->>Router: Return result
                Router-->>Worker1: Response
                Worker1->>WorkerPool: Send response via ret_chan
            end
        end
    end
    
    par Worker2 Loop (parity=false)
        loop Every iteration
            Worker2->>Worker2: try_recv(rx) - non-blocking
            alt Request available
                Worker2->>Router: Route request to handler
                Router->>Storage: Execute graph operation
                Storage-->>Router: Return result
                Router-->>Worker2: Response
                Worker2->>WorkerPool: Send response via ret_chan
            else Empty
                Worker2->>Worker2: Skip (no busy wait here)
            end
            Worker2->>Worker2: recv(cont_rx) - BLOCKS until continuation
            alt Continuation received
                Worker2->>Worker2: Execute continuation function
            end
        end
    end

    WorkerPool-->>Client: Response
```
</details>


<!-- greptile_other_comments_section -->

<!-- /greptile_comment -->
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants